ci: performance gate — smoke tests + baseline comparison #53

Merged
tig merged 2 commits into develop from ci/perf-gate on May 11, 2026
tig commented May 11, 2026

Summary

Two lightweight layers that catch performance regressions and celebrate improvements without slowing CI:

Layer 1: Performance smoke tests (xUnit)

4 Stopwatch-based tests in Terminal.Gui.Editor.Tests/PerformanceSmokeTests.cs that run in the normal test suite on every CI run. Thresholds are deliberately fat (50–250x typical) so they only fail on catastrophic regressions — not CI-runner noise.

| Test | What it measures | Typical | Threshold |
|---|---|---|---|
| BuildViewport_50Lines | 50-line viewport build (10K doc) | ~200 µs | 50 ms |
| BuildSingleLongLine | 100× long-line build (200 chars) | ~1.6 ms | 10 ms |
| DocumentLineLookup_100K | 5000 tree lookups (100K doc) | ~33 µs | 5 ms |
| FullDocumentScroll_1K | Full scroll simulation (1K lines) | ~4 ms | 200 ms |

Layer 2: Benchmark baseline comparison (CI step)

A new CI step (Performance check, Ubuntu only) that:

  1. Runs VisualLineBuildBenchmarks (ShortRun, ~30s)
  2. Compares results to benchmarks/baseline.json
  3. Posts a markdown comparison table to the GitHub step summary
  4. Fails CI if any benchmark exceeds 3x baseline (egregious regression)
  5. Celebrates 🎉 if any benchmark drops below 0.8x baseline (nice improvement)
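
The comparison rule in steps 4 and 5 can be sketched in a few lines of shell. This is a minimal illustration, not the contents of the actual comparison script: the function names `classify` and `summary_row`, the output labels, and the table column order are assumptions. Only the 3x and 0.8x thresholds come from this PR. `GITHUB_STEP_SUMMARY` is the file path GitHub Actions provides for step summaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one benchmark mean against its baseline.
# Thresholds (fail above 3x, celebrate below 0.8x) are the ones stated in
# this PR; the labels printed here are illustrative only.
classify() {
  local mean="$1" baseline="$2"
  awk -v m="$mean" -v b="$baseline" 'BEGIN {
    if (b <= 0) { print "invalid"; exit }   # guard against a bad baseline
    r = m / b
    if (r > 3.0)      print "regression"    # egregious slowdown: fail CI
    else if (r < 0.8) print "improvement"   # noticeably faster: celebrate
    else              print "ok"
  }'
}

# Hypothetical helper: append one markdown table row to the step summary.
# Falls back to stdout when run outside GitHub Actions.
summary_row() {
  local name="$1" mean="$2" baseline="$3"
  printf '| %s | %s | %s | %s |\n' \
    "$name" "$baseline" "$mean" "$(classify "$mean" "$baseline")" \
    >> "${GITHUB_STEP_SUMMARY:-/dev/stdout}"
}
```

A caller would loop over the benchmarks, call `summary_row` for each, and exit non-zero if any `classify` result is `regression`. Note that a ratio of exactly 3.0 passes under this sketch, matching the "exceeds 3x" wording.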

Updating the baseline

After a deliberate performance change (optimization or known cost increase):

# Re-run and update baseline.json with new numbers
dotnet run --project benchmarks/Terminal.Gui.Editor.Benchmarks -c Release -- --filter "*VisualLineBuild*"
# Edit benchmarks/baseline.json with the new means, commit

Test plan

  • dotnet build Terminal.Gui.Text.slnx succeeds
  • All 57 editor tests pass (53 existing + 4 new smoke tests)
  • dotnet format --verify-no-changes clean
  • Smoke tests complete in <1s total

🤖 Generated with Claude Code

tig and others added 2 commits May 11, 2026 06:59
Two layers that catch regressions without slowing CI:

1. PerformanceSmokeTests (xUnit, runs in normal test suite):
   - Stopwatch-based with fat thresholds (50–250x headroom)
   - Catches catastrophic regressions only
   - 4 tests: viewport build, long-line build, 100K-line tree
     lookup, full 1K-line scroll

2. Benchmark baseline comparison (CI step, Ubuntu only):
   - Runs VisualLineBuild benchmarks (ShortRun, ~30s)
   - Compares to benchmarks/baseline.json
   - Fails CI if any benchmark > 3x baseline (regression)
   - Celebrates in step summary if any < 0.8x baseline (improvement)
   - Results posted to GitHub step summary as markdown table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI runners (shared, no turbo boost) are 2–4x slower than a local
M-series machine. The 10 ms threshold was too tight: Ubuntu hit 23 ms,
macOS 38 ms, Windows 20 ms. Bump it to 100 ms to keep fat headroom.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tig merged commit 27d0b7e into develop May 11, 2026
6 checks passed
tig deleted the ci/perf-gate branch May 11, 2026 13:33
tig added a commit that referenced this pull request May 12, 2026
Split performance work into its own csproj and CI workflow so the
correctness-focused CI stays fast across all three OSes and the perf
gate stops being a silent no-op.

New layout

  tests/Terminal.Gui.Editor.PerformanceTests/
    PerformanceSmokeTests.cs            (moved from Editor.Tests/)
    Terminal.Gui.Editor.PerformanceTests.csproj

  .github/workflows/perf.yml             (ubuntu-latest only)
    - Release build
    - Run PerformanceTests (stopwatch smoke tests)
    - Run benchmarks/compare-baseline.sh (VisualLineBuild gate)
    - workflow_dispatch with `full-suite: true` runs the full
      BenchmarkDotNet matrix and uploads results as an artifact —
      the operator path for refreshing baseline.json (#78).

  .github/workflows/ci.yml
    - Perf step removed; comment points to perf.yml.

Why a separate workflow
  - Windows / macOS GitHub-hosted runners share hosts with neighbour
    VMs; wall-time assertions there are too noisy to gate on. Linux
    runners are still noisy but consistent enough for a 3× threshold.
  - The full BDN suite takes minutes; CI for correctness needs to be
    fast. Per-PR perf only runs the focused VisualLineBuild filter.

Fix while we're here: compare-baseline.sh used `--job ShortRun`,
which BenchmarkDotNet rejects ("invalid base job"). BDN exited
without running any benchmarks, the script saw no JSON report,
warned "skipping comparison", and exited 0. So the perf gate has
been a silent no-op since PR #53: neither the >3× fail nor the
<0.8× celebrate could ever fire (see issue #78; PR #77 didn't
trigger the celebration for exactly this reason). Switched to
`--job short` (the lowercase form BDN accepts) and added a comment
documenting the history.

Tests on this branch (local Release):
  Text.Tests:        230 passing
  Editor.Tests:       87 passing  (was 91; 4 perf tests moved out)
  IntegrationTests:  108 passing
  PerformanceTests:    4 passing  (new project)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
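
The silent no-op described in the commit above came from treating a missing JSON report as a reason to skip rather than to fail. One way to guard against that class of bug is sketched below. It is an illustration under assumptions, not the code in `compare-baseline.sh`: the function name `require_report` is hypothetical, and the directory and file pattern assume BenchmarkDotNet's default artifacts layout.

```shell
#!/usr/bin/env bash
# Hypothetical guard: treat "no report produced" as a gate failure, not a skip.
# The default directory and the '*-report*.json' pattern are assumptions
# about where the benchmark run writes its JSON output.
require_report() {
  local dir="${1:-BenchmarkDotNet.Artifacts/results}"
  local report
  report="$(find "$dir" -name '*-report*.json' 2>/dev/null | head -n 1)"
  if [ -z "$report" ]; then
    echo "error: no benchmark JSON report found in $dir; failing the gate" >&2
    return 1
  fi
  printf '%s\n' "$report"   # hand the report path to the caller
}
```

A script would call `report="$(require_report)" || exit 1` before comparing, so a benchmark run that exits without producing results turns the check red instead of passing quietly.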